Workflow

After adapter removal with Cutadapt, transcripts were quantified with Kallisto. Integrated normalization and differential expression analysis were conducted with Sleuth, following the standard procedure outlined in the manual. For sample metadata, see config/samples.tsv.


Differential gene expression

Differential transcript expression

Expression Matrices

Expression Plots

Fragment length distribution

GO term enrichment analysis

Gene set enrichment analysis

Heatmaps

IHW

Other

PCA

Pathway enrichment analysis

QC

Statistics

If the workflow was executed on a cluster or in the cloud, runtimes include the time spent waiting in the queue.

Configuration

Configuration files
File Code
config/config.yaml
samples: config/samples.tsv
units: config/units.tsv

resources:
  ref:
    transcriptome: "ngs-test-data/ref/transcriptome.chr21.fa"
    # species needs to be an identifier known to biomart, e.g. mmusculus, hsapiens
    species: hsapiens
    # this is the version of the bioconda package `bioconductor-org.{species}.eg.db` that
    # you want -- this needs to be compatible with the versions of `r-base` and the
    # bioconductor packages specified e.g. in `envs/` files `fgsea.yaml`, `spia.yaml` and
    # `ens_gene_to_go.yaml`
    species_db_version: "3.10"
  ontology:
    # gene ontology to download, used e.g. in goatools
    gene_ontology: "http://current.geneontology.org/ontology/go-basic.obo"

pca:
  labels:
    # columns of sample sheet to use for PCA
    - condition

scatter:
  # for use as diagnostic plots
  # all samples are compared in pairs to assess their correlation
  # scatter plots are only created if parameter 'activate' is set to 'true'
  activate: true

diffexp:
  # samples to exclude (e.g. outliers due to technical problems)
  exclude:
  # model for sleuth differential expression analysis
  models:
    model_X:
      full: ~condition + batch_effect
      reduced: ~batch_effect
      # Binary valued covariate that shall be used for fold change/effect size
      # based downstream analyses.
      primary_variable: condition
  # significance level to use for volcano, ma- and qq-plots
  sig-level:
    volcano-plot: 0.05
    ma-plot: 0.05
    qq-plot: 0.05

enrichment:
  goatools:
    # tool is only run if set to `true`
    activate: true
    fdr_genes: 0.05
    fdr_go_terms: 0.05
  fgsea:
    gene_sets_file: "ngs-test-data/ref/dummy.gmt"
    # tool is only run if set to `true`
    activate: true
    # if activated, you need to provide a GMT file with gene sets of interest
    fdr_gene_set: 0.05
    nperm: 10000
  spia:
    # tool is only run if set to `true`
    activate: true
    # pathway database to use in SPIA, needs to be available for
    # the species specified by resources -> ref -> species above
    pathway_database: "panther"

bootstrap_plots:
  # desired false discovery rate for bootstrap plots, i.e. a lower FDR will result in fewer boxplots generated
  FDR: 0.01
  # maximum number of bootstrap plots to generate, i.e. top n discoveries to plot
  top_n: 3
  color_by: condition
  # for now, this will plot the sleuth-normalised kallisto count estimates
  # for all the transcripts of the respective genes
  genes_of_interest:
    - A4galt

plot_vars:
  # significance level used for plot_vars() plots
  sig_level: 0.1

params:
  kallisto: "-b 100"
  # these cutadapt parameters need to contain the required flag(s) for
  # the type of adapter(s) to trim, i.e.:
  # * https://cutadapt.readthedocs.io/en/stable/guide.html#adapter-types
  #   * `-a` for 3' adapter in the forward reads
  #   * `-g` for 5' adapter in the forward reads
  #   * `-b` for adapters anywhere in the forward reads
  # also, separate capitalised letter flags are required for adapters in
  # the reverse reads of paired end sequencing:
  # * https://cutadapt.readthedocs.io/en/stable/guide.html#trimming-paired-end-reads
  cutadapt-se: ""
  # reasoning behind parameters:
  #   * `--minimum-length 33`:
  #     * kallisto needs non-empty reads in current versions (fixed for future releases:
  #       https://github.com/pachterlab/kallisto/commit/64fe837ca86f3664496483bcd2787c9376584fed)
  #     * kallisto default k-mer length is 31 and 33 should give at least 3 k-mers for a read
  #   * `-e 0.005`: the default cutadapt maximum error rate of `0.1` is far too high, for Illumina
  #     data the error rate is more in the range of `0.005` and setting it accordingly should avoid
  #     false positive adapter matches
  #   * `--overlap 7`: the cutadapt default minimum overlap of `3` did trimming on the level
  #     of expected adapter matches by chance
  cutadapt-pe: "-a ACGGATCGATCGATCGATCGAT -g GGATCGATCGATCGATCGAT -A ACGGATCGATCGATCGATCGAT -G GGATCGATCGATCGATCGAT --minimum-length 33 -e 0.005 --overlap 7"
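
The reasoning in the cutadapt comments above can be checked with a little arithmetic. The following is a minimal sketch, not part of the workflow: the function names are hypothetical, the chance-match estimate assumes uniform and independent bases, and the allowed-error count assumes cutadapt rounds the product of match length and error rate down to an integer. The comparison values (error rate `0.1`, overlap `3`) are cutadapt's documented defaults as far as I am aware.

```python
# Hypothetical helpers illustrating the parameter reasoning; not part of the workflow.

def kmers_per_read(read_length: int, k: int = 31) -> int:
    """Number of k-mers in a read of the given length (0 if the read is shorter than k)."""
    return max(read_length - k + 1, 0)


def chance_match_probability(overlap: int) -> float:
    """Probability that `overlap` random bases match an adapter end exactly.

    Assumes uniform, independent bases, which is a simplification.
    """
    return 0.25 ** overlap


def allowed_errors(match_length: int, error_rate: float) -> int:
    """Errors tolerated in a match, assuming floor(match_length * error_rate)."""
    return int(match_length * error_rate)


if __name__ == "__main__":
    # A 33 bp read with kallisto's default k-mer length of 31
    print("k-mers in a 33 bp read:", kmers_per_read(33))

    # Chance matches per position for a short overlap versus --overlap 7
    for overlap in (3, 7):
        print(f"overlap {overlap}: chance match probability {chance_match_probability(overlap):.6f}")

    # The adapter given via -a in cutadapt-pe is 22 bases long
    adapter = "ACGGATCGATCGATCGATCGAT"
    for rate in (0.1, 0.005):
        print(f"error rate {rate}: {allowed_errors(len(adapter), rate)} errors allowed in a full-length match")
```

Under these assumptions, an overlap of 3 matches by chance at roughly 1 in 64 positions, whereas an overlap of 7 does so at roughly 1 in 16,384, and `-e 0.005` permits no mismatches even across the full 22-base adapter.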
